Pii: S0895-4356(01)00341-9

Authors

  • Ewout W. Steyerberg
  • Frank E. Harrell
  • Gerard J. J. M. Borsboom
  • Yvonne Vergouwe
  • Dik F. Habbema
Abstract

The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model. © 2001 Elsevier Science Inc. All rights reserved.
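The bootstrapping idea in the abstract can be illustrated with a minimal sketch of one commonly used variant, the optimism-corrected bootstrap, applied to discriminative ability (the c-statistic). The abstract does not say which bootstrap variant performed best, so this is an illustration under stated assumptions, not the authors' procedure. All function names (`fit_logistic`, `c_statistic`, `bootstrap_corrected_c`, `simulate`), the gradient-ascent fitting routine, and the simulated two-predictor data are hypothetical and stand in for the eight-predictor GUSTO-I model.

```python
import math
import random


def sigmoid(z):
    # Clamp the linear predictor to avoid overflow in exp().
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=100, lr=1.0):
    """Rough logistic fit by gradient ascent on the mean log-likelihood.

    Returns [intercept, slope_1, ..., slope_p]. A real analysis would use
    a proper maximum-likelihood routine; this keeps the sketch stdlib-only.
    """
    n, p = len(xs), len(xs[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, y in zip(xs, ys):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = y - sigmoid(z)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, xs):
    """Linear predictor for each subject (monotone in predicted risk)."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)) for x in xs]


def c_statistic(scores, ys):
    """Share of event/non-event pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(pos) * len(neg))


def bootstrap_corrected_c(xs, ys, n_boot=40, seed=0):
    """Return (apparent c, mean optimism, optimism-corrected c)."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = c_statistic(predict(fit_logistic(xs, ys), xs), ys)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # skip resamples that contain only one outcome class
        w = fit_logistic(bx, by)  # refit the model in the bootstrap sample
        c_boot = c_statistic(predict(w, bx), by)  # performance where fitted
        c_orig = c_statistic(predict(w, xs), ys)  # performance in original sample
        optimism.append(c_boot - c_orig)
    mean_optimism = sum(optimism) / len(optimism)
    return apparent, mean_optimism, apparent - mean_optimism


def simulate(n, seed=1):
    """Hypothetical data: two standard-normal predictors, one informative."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-1.0 + 1.5 * x[0]) else 0 for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(120)
    apparent, optimism, corrected = bootstrap_corrected_c(xs, ys)
    print(f"apparent c = {apparent:.3f}, optimism = {optimism:.3f}, "
          f"corrected c = {corrected:.3f}")
```

The key design point is that the whole model-building procedure is repeated inside each bootstrap sample, and the average drop in performance when each bootstrap model is applied back to the original sample is subtracted from the apparent performance. Unlike a split-sample analysis, no subjects are withheld from model development.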

Similar resources

Pii: S0895-4356(00)00344-9

A critique is presented of the use of tree-based partitioning algorithms to formulate classification rules and identify subgroups from clinical and epidemiological data. It is argued that the methods have a number of limitations, despite their popularity and apparent closeness to clinical reasoning processes. The issue of redundancy in tree-derived decision rules is discussed. Simple rules may ...

Pii: S0895-4356(02)00504-8

An almost endless number of observations and experiments have effectively falsified the hypothesis that dietary cholesterol and fats, and a high cholesterol level play a role in the causation of atherosclerosis and cardiovascular disease. The hypothesis is maintained because allegedly supportive, but insignificant findings, are inflated, and because most contradictory results are misinterpreted...

Pii: S0895-4356(01)00372-9

Logistic regression (LR) is a widely used multivariable method for modeling dichotomous outcomes. This article examines use and reporting of LR in the medical literature by comprehensively assessing its use in a selected area of medical study. Medline, followed by bibliography searches, identified 15 peer-reviewed English-language articles with original data, employing LR, published between 198...

Pii: S0895-4356(01)00517-0

This report describes the principal methods used in the development, conduct, and analysis of the research study “Health Assessment of Persian Gulf War Veterans from Iowa” (Iowa Gulf War Study). The methods presented include an outline of the organizational structure, study timeline, hypotheses, outcome definitions, and study design. Adhering to a strict timeline, the study protocol and instrum...

Publication date: 2001